Applying the Multiple Cause Mixture Model to Text Categorization

Authors

  • Mehran Sahami
  • Marti A. Hearst
  • Eric Saund

Abstract

This paper introduces the use of the Multiple Cause Mixture Model for automatic text category assignment. Although much research has been done on text categorization, this algorithm is novel in that it is unsupervised, that is, it does not require pre-labeled training examples, and it can assign multiple category labels to documents. In this paper we present very preliminary results of the application of this model to a standard test collection, evaluating it in supervised mode in order to facilitate comparison with other methods, and showing initial results of its use in unsupervised mode.

Similar resources

Parametric Mixture Models for Multi-Labeled Text

We propose probabilistic generative models, called parametric mixture models (PMMs), for the multiclass, multi-labeled text categorization problem. Conventionally, the binary classification approach has been employed, in which whether or not a text belongs to a category is judged by a binary classifier for every category. In contrast, our approach can simultaneously detect multiple categories of te...

Instance Label Prediction by Dirichlet Process Multiple Instance Learning

We propose a generative Bayesian model that predicts instance labels from weak (bag-level) supervision. We solve this problem by simultaneously modeling class distributions by Gaussian mixture models and inferring the class labels of positive bag instances that satisfy the multiple instance constraints. We employ Dirichlet process priors on mixture weights to automate model selection, and effic...

Development of a Multi-Classifier Approach for Multilingual Text Categorization

Research work related to applying text categorization methods to a monolingual corpus such as English text collections has been well established by several research teams in recent years. However, little attention has been paid to applying the techniques to classify the documents in multiple languages such as English and Chinese by means of a unified model. In this paper we propose a multi-clas...

Large margin multinomial mixture model for text categorization

In this paper, we present a novel discriminative training method for multinomial mixture models (MMM) in text categorization based on the principle of large margin. Under some approximation and relaxation conditions, large margin estimation (LME) of MMMs can be formulated as linear programming (LP) problems, which can be efficiently and reliably solved by many general optimization tools even fo...

Classifying Business Types from Twitter Posts Using Active Learning

Today, many companies have adopted Twitter as an additional marketing medium to advertise and promote their business activities. One possible solution for organizing a large number of posts is to classify them into predefined categories of business types. Applying a normal text categorization technique to Twitter is ineffective due to the short-length (140-character limit) characteristic of each ...

Publication date: 1996